Reveal the daily routines of two selected participant of the city of Engagement, Ohio USA.
This take-home exercise aims to reveal the daily life of the participants of the city of Engagement, Ohio USA by using visualization techniques in R.
This challenge considers the patterns of daily life throughout the city. To describe the daily routines for some representative people, characterize the travel patterns to identify potential bottlenecks or hazards, and examine how these patterns change over time and seasons.
Assume the participants have given permission to have their daily routines captured. Choose two different participants with different routines and describe their daily patterns, with supporting evidence. Limit your response to 10 images and 500 words.
We consider participants with ID 4 and 486 for the study who have stark differences in their age, joviality, education and household.
Links to the dataset:
Buildings.csv
TravelJournal.csv
ParticipantStatusLogs10.csv
Packages, namely tidyverse and ggplot2and
ViSiElse are required for this exercise. This code chunk installs the
required packages and loads them onto RStudio environment.
packages = c('tidyverse','ggplot2','ggdist', 'ggridges','patchwork', 'ggthemes','hrbrthemes','ggrepel','ggforce',"HH","vcd",'scales','grid','gridExtra','formattable','readr', 'ggiraph', 'plotly', 'DT', 'gganimate','readxl','gifski','gapminder','treemap','treemapify','rPackedBar','ggstatsplot','ggside','broom','crosstalk','ViSiElse','zoo', 'lubridate', 'remotes', 'trelliscopejs','data.table','sf','tmap','sf','clock','sftime','rmarkdown')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
The dataset used in this exercise is Participants.csv, published by the IEEE for [VAST challenge 2022] (https://vast-challenge.github.io/2022/)
The code chunk below imports Buildings.csv ,
TravelJournal.csv, ParticipantStatusLogs10.csv from
the data folder into R by using read_csv()
function of readr and saves it as Tibble data frame
called buildings, travel, logs
The TravelJournal.csv contains information about participants’ motivation for movement around the city.
travel <- read_csv("data/TravelJournal.csv")
summary(travel)
buildings <- read_sf("data/Buildings.csv",
options = "GEOM_POSSIBLE_NAMES=location")
logs <- read_csv("data/ParticipantStatusLogs10.csv")
The Travel Journal contains financial transactions by a participant towards Work/Home Commute, Eating, Coming Back From Restaurant,Recreation (Social Gathering), Going Back to Home. The checkin journal contains checkin details into a Restaurant, Pub, Apartment or Workplace
We filter out the records related participant 4 and 486 We choose participant 4 for analysis as participant 4 - has kids in his household, has a household size of 3, is aged 43, has completed his/her bachelors and has a high joviality of 0.85
As compared to participant 4 participant 486 - has no lids in household, has a household size of 1, is aged 29, has low education level and a low joviality of 0.02
We aim to find the lifestyle patterns and daily routine of both these participants.
Calculating the total amount spent at the location as a difference of the starting balance and ending balance in the travel journal gives us the amount spent by the participant at a particular location.
travel_filt$amountSpent <- travel_filt$endingBalance -travel_filt$startingBalance
We use weekdays(), day(), month(), year() functions to extract the day of the week, date, moth and year of checkin to perform time series visualizations.
Calculate travel time as the difference between the travel start time and the travel end time and calculate the time spent as the difference of check in and check out times.
data_travel= travel_filt %>%
mutate(weekday = weekdays(checkInTime),
day = day(checkInTime),
month=as.character(checkInTime,"%b %y"),
year = year(checkInTime),
monthYear = floor_date(checkInTime, "month"),
travelEndLocationId=as.character(travelEndLocationId),
timeSpent = checkOutTime - checkInTime,
travelTime = travelEndTime- travelStartTime,
participantId=as.character(participantId),
purpose=as.character(purpose))
data_travel$timeSpent <- as.numeric(as.character(data_travel$timeSpent))
data_travel$travelTime <- as.numeric(as.character(data_travel$travelTime))
data_travel <- data_travel[,c("participantId","travelStartLocationId", "travelEndLocationId", "purpose", "checkInTime", "amountSpent","timeSpent","travelTime","weekday","day","month","year","monthYear")]
# A tibble: 6 × 13
participantId travelStartLocationId travelEndLocationId purpose
<chr> <dbl> <chr> <chr>
1 4 194 411 Work/Home C…
2 4 411 448 Eating
3 486 945 1311 Work/Home C…
4 486 1311 1345 Eating
5 4 448 411 Coming Back…
6 486 1345 1311 Coming Back…
# … with 9 more variables: checkInTime <dttm>, amountSpent <dbl>,
# timeSpent <dbl>, travelTime <dbl>, weekday <chr>, day <int>,
# month <chr>, year <int>, monthYear <dttm>
# A tibble: 6 × 12
timestamp currentLocation participantId currentMode
<dttm> <chr> <dbl> <chr>
1 2022-04-26 00:25:00 POINT (-2619.97535009… 486 AtHome
2 2022-04-26 00:30:00 POINT (1488.843264587… 4 AtHome
3 2022-04-26 00:30:00 POINT (-2619.97535009… 486 AtHome
4 2022-04-26 00:35:00 POINT (1488.843264587… 4 AtHome
5 2022-04-26 00:35:00 POINT (-2619.97535009… 486 AtHome
6 2022-04-26 00:40:00 POINT (1488.843264587… 4 AtHome
# … with 8 more variables: hungerStatus <chr>, sleepStatus <chr>,
# apartmentId <dbl>, availableBalance <dbl>, jobId <dbl>,
# financialStatus <chr>, dailyFoodBudget <dbl>,
# weeklyExtraBudget <dbl>
p<- ggplot(march, aes(x=monthYear, y=travelTime, group=participantId)) +
geom_line(aes(color=participantId),show.legend = TRUE)+
labs(
y= 'Average Time',
x= 'months -2022',
title = "Average time spent in Travel - 2022",
caption = "Ohio USA"
) +
theme_minimal()+
theme(axis.ticks.x= element_blank(),
panel.background= element_blank(),
legend.background = element_blank(),
plot.title = element_text(size=12, face="bold",hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0),
axis.title.y= element_text(angle=0))
ggplotly(p)
plot_ly(data = data_travel,
x = ~travelTime,
y = ~timeSpent,
color = ~weekday) %>%
layout(title = 'Travel time v/s time spent')
ggplot(data_travel, aes(x = travelTime,
y = timeSpent,
size = participantId,
colour = weekday)) +
geom_point(alpha = 0.5, show.legend = TRUE) +
labs(title = 'Time spent at a place v/s travel time',
x = 'Travel time',
y = 'Time Spent') +
theme_minimal()+
transition_time(as.integer(monthYear)) +
ease_aes('linear')
participant_data = data_travel %>% group_by(weekday,month, purpose, participantId) %>%
summarise(timeSpent = mean(timeSpent))
ggplot(data= participant_data,
aes(x = timeSpent, y= month, group = month, fill = month)) +
geom_density_ridges(geom = "density_ridges_gradient",
calc_ecdf = TRUE,
quantiles = 4,
quantile_lines = TRUE,
alpha = .3) +
theme_ridges() +
scale_fill_viridis_d(name = "Quartiles")+
theme_bw()+
labs(
y= 'month',
x= 'Time Spent',
title = "Time Spent at Venue",
caption = "Ohio USA"
)+
theme(
axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0),
axis.text.x = element_text(vjust = 0.5),
legend.position = "none"
)+
facet_wrap(~participantId)
ggplot(data_travel,
aes(y = timeSpent, x = factor(month,
levels=c("Mar 22","Apr 22","May 22","Jun 22","Jul 22","Aug 22","Sep 22","Oct 22","Nov 22","Dec 22","Jan 23","Feb 23","Mar 23","Apr 23","May 23")), fill= purpose))+
geom_col(aes(group=1))+
xlab("Month")+
ylab("Time Spent")+
facet_trelliscope(~ weekday,nrow=2,ncol=2,path=".",width=800)+
theme(axis.text.x = element_text(size=6),
axis.text.y = element_text(size=6))
spending_data = data_travel %>% group_by(weekday, participantId, purpose) %>%
summarise(amountSpent = sum(amountSpent))
plot_ly(data=spending_data, x = ~purpose, y = ~amountSpent, color = ~weekday)
participant_time = data_travel %>%
mutate(weekday = weekdays(checkInTime),
day = day(checkInTime),
hour = hour(checkInTime),
participantId=as.character(participantId))%>%
na.omit()
participant_time%>%
ggplot(aes(hour,weekday,fill=travelTime))+
geom_tile(color = "white", size = 0.05)+
theme_tufte(base_family = "Helvetica")+
coord_equal() +
scale_fill_gradient(name = "Time (min)",
low = "sky blue",
high = "dark blue",
labels = comma,
na.value = 'sky blue')+
labs(x = "Weeks of the Year",
y = NULL,
title = "Average Travel Time participant") +
theme(axis.text = element_text(size = 7,margin = margin(r = -60)),
axis.ticks.y= element_blank(),
legend.title = element_text(size =12),
legend.text = element_text(size = 12),
axis.title.x = element_text(size=12),
panel.background = element_rect(fill = 'sky blue'))+
facet_col(~participantId)
logs_path <- logs_plot %>%
mutate(Timestamp = date_time_parse(timestamp,
zone= "",
format = "%Y-%m-%dT%H:%M:%S"))%>%
mutate(day=get_day(Timestamp))%>%
filter(currentMode == "Transport")
logs_path <- logs_path %>%
group_by(participantId, day) %>%
summarize(m = mean(Timestamp),
do_union=FALSE) %>%
st_cast("LINESTRING")
logs_path_4 <- logs_path %>%
filter(participantId==4)
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 1,
border.col = "black",
border.lwd = 1)+
tm_shape(logs_path)+
tm_lines(col = "blue")+
tmap_mode("plot")
logs_path_486 <- logs_path %>%
filter(participantId==486)
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 1,
border.col = "black",
border.lwd = 1)+
tm_shape(logs_path)+
tm_lines(col = "blue")+
tmap_mode("plot")
logs_4<- logs_4 %>%
mutate(time= format(as.POSIXct(timestamp), # Extract hours, minutes & seconds
format = "%H:%M:%S"),)
logs_4<- logs_4[,c("timestamp","time", "currentMode", "hungerStatus", "sleepStatus")]
logst_4 <- transpose(logs_4)